153 research outputs found

    A Provable Smoothing Approach for High Dimensional Generalized Regression with Applications in Genomics

    In many applications, linear models fit the data poorly. This article studies an appealing alternative, the generalized regression model. This model only assumes that there exists an unknown monotonically increasing link function connecting the response $Y$ to a single index $X^T\beta^*$ of explanatory variables $X\in\mathbb{R}^d$. The generalized regression model is flexible and covers many widely used statistical models. It fits the data generating mechanisms well in many real problems, which makes it useful in a variety of applications where regression models are regularly employed. In low dimensions, rank-based M-estimators are recommended to deal with the generalized regression model, giving root-$n$ consistent estimators of $\beta^*$. Applications of these estimators to high dimensional data, however, are questionable. This article studies, both theoretically and practically, a simple yet powerful smoothing approach to handle the high dimensional generalized regression model. Theoretically, a family of smoothing functions is provided, and the amount of smoothing necessary for efficient inference is carefully calculated. Practically, our study is motivated by an important and challenging scientific problem: decoding gene regulation by predicting transcription factors that bind to cis-regulatory elements. Applying our proposed method to this problem shows substantial improvement over the state-of-the-art alternative in real data. Comment: 53 pages

    Least Squares Based and Two-Stage Least Squares Based Iterative Estimation Algorithms for H-FIR-MA Systems

    This paper studies the identification of Hammerstein finite impulse response moving average (H-FIR-MA for short) systems. A new two-stage least squares iterative algorithm is developed to identify the parameters of the H-FIR-MA systems. The simulation cases indicate the efficiency of the proposed algorithms.

    STATISTICAL METHODS FOR DECODING GENE REGULATION IN SINGLE CELLS

    Single-cell sequencing is rapidly transforming biomedical research. With the ability to measure omics information in individual cells, it provides unprecedented resolution to study heterogeneous biological and clinical samples, enabling scientists to discover and characterize previously unknown biological signals and processes carried by novel or rare cell subpopulations. The new data structure and high level of noise in single-cell genomic data pose significant analytical challenges. To address these challenges, we developed new statistical and computational methods for analyzing single-cell transcriptome and regulome data. First, to infer cells’ underlying developmental trajectories, we developed TSCAN, which performs “pseudotime” analysis with a cluster-based minimum spanning tree approach. TSCAN facilitates accurate construction of pseudotemporal trajectories by regularizing the complexity of spanning trees. By improving the bias-variance tradeoff of the spanning tree estimation, TSCAN substantially improved the accuracy and robustness of the pseudotime analysis. Second, we developed RAISIN to support regression and differential analysis in single-cell RNA-seq datasets with multiple samples. Compared with the classical linear mixed-effects model, RAISIN improves variance estimation and statistical power for datasets with small sample size or cell number, and improves scalability for datasets with large sample size and millions of cells. Third, we developed SCATE to extract and enhance signals from highly noisy and sparse single-cell ATAC-seq data. SCATE accurately infers genome-wide activities of each individual cis-regulatory element by adaptively integrating information from co-activated cis-regulatory elements, similar cells, and massive amounts of publicly available regulome data. The enhanced signal improves the performance of downstream analyses such as peak calling and prediction of transcription factor binding sites.
    These methods have been applied in numerous collaborative projects and helped decipher gene regulatory programs in the T cell exhaustion process and identify molecular signatures in neoadjuvant immunotherapy.

    Word Representation with Salient Features


    Stacking tunable interlayer magnetism in bilayer CrI3

    Diverse interlayer tunability of physical properties of two-dimensional layers mostly lies in the covalent-like quasi-bonding that is significant in electronic structures but rather weak for energetics. Such characteristics result in various stacking orders that are energetically comparable but may significantly differ in terms of electronic structures, e.g. magnetism. Inspired by several recent experiments showing interlayer anti-ferromagnetically coupled CrI3 bilayers, we carried out first-principles calculations for CrI3 bilayers. We found that the anti-ferromagnetic coupling results from a new stacking order with the C2/m space group symmetry, rather than the graphene-like one with R3 as previously believed. Moreover, we demonstrated that the intra- and inter-layer couplings in the CrI3 bilayer are governed by two different mechanisms, namely ferromagnetic super-exchange and direct-exchange interactions, which are largely decoupled because of their significant difference in strength at the strong- and weak-interaction limits. This allows the much weaker interlayer magnetic coupling to be tuned more feasibly by stacking order alone. Given that interlayer magnetic properties can be altered by changing the crystal structure through different stacking orders, our work opens a new paradigm for tuning interlayer magnetic properties with the freedom of stacking order in two-dimensional layered materials.

    Geochemical Composition Variations and Tectonic Implications of the Baoligaomiao Formation Volcanic Rocks from the Uliastai Continental Margin, Southeast Central Asian Orogenic Belt

    The Permo-Carboniferous tectonic evolution in the Uliastai continental margin (UCM), north of the southeast Central Asian Orogenic Belt, remains controversial. This work examined the geochemical composition of the felsic volcanic rocks from the lower and upper parts of the Baoligaomiao Formation in the UCM. Zircon U-Pb ages reveal that the Baoligaomiao Formation has a long-lived eruption duration, from ca. 328 to 285 Ma. The lower part (ca. 328–310 Ma) of the Baoligaomiao Formation is dominated by clastic and pyroclastic rocks with subordinate intermediate-felsic volcanic rocks, whereas the upper part (ca. 307–285 Ma) mainly consists of felsic volcanic rocks and pyroclastic rocks. Calculations reveal that the felsic volcanic rocks from the lower part have low zircon saturation temperatures (TZr = 747℃–795℃), whereas those from the upper part exhibit high TZr (ca. 793℃–930℃). Zircons from the lower part exhibit high εHf(t) values and 176Lu/177Hf ratios, in contrast to the low εHf(t) values and 176Lu/177Hf ratios of zircons from the upper part. These petrogeological and geochemical shifts might support the tectonic switch model in the UCM at the end of the Carboniferous, providing new constraints on the Late Carboniferous closure of the Hegenshan Ocean.